PyDigger - unearthing stuff about Python


Name Version Summary Date
rftokenizer 2.3.0 A character-wise tokenizer for morphologically rich languages 2024-12-17 19:05:30
alphacodings 0.2.0 base26 ([A-Z]) and base52 ([A-Za-z]) encodings 2024-12-09 03:04:43
QuickBPE 2.1 A fast BPE implementation in C 2024-12-05 11:37:29
zhon 2.1.1 Zhon provides constants used in Chinese text processing. 2024-11-20 00:29:10
llama-tokens 0.0.3 A Quick Library with Llama 3.1/3.2 Tokenization - source https://github.com/jeffxtang/llama-tokens 2024-11-10 17:03:39
eKoNLPy 2.0.6 A Korean natural language processing toolkit for economic analysis 2024-11-02 00:02:54
huspacy 0.12.0 HuSpaCy: industrial strength Hungarian natural language processing 2024-10-28 10:30:55
miditok 3.0.4 MIDI / symbolic music tokenizers for Deep Learning models. 2024-09-15 10:43:00
maze-dataset 1.1.0 generating and working with datasets of mazes 2024-09-10 19:33:49
taibun 1.1.7 Taiwanese Hokkien Transliterator and Tokeniser 2024-08-31 20:25:01
bpeasy 0.1.3 Fast bare-bones BPE for modern tokenizer training 2024-08-23 10:47:52
simplemma 1.1.1 A lightweight toolkit for multilingual lemmatization and language detection. 2024-08-08 12:20:45
textmate-grammar-python 0.6.1 A lexer and tokenizer for grammar files as defined by TextMate and used in VSCode, implemented in Python. 2024-07-31 18:43:24
process-twarc 0.20.2 Tools for transforming raw data from Twarc2 to structured data for Masked Language Modeling. 2024-06-12 11:40:55
example990420 1.1.1 Taiwanese Hokkien Transliterator and Tokeniser 2024-05-01 20:28:38